Efficient Parallel And Serial Approximate String Matching
نویسندگان
چکیده
Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern or a superfluous character in the text or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The first part of the parallel algorithm consists of analysis of the pattern and takes O(log m) time using m 2 processors. The rest of the algorithm consists of handling the text. The text handling part applies the following new approach. This part starts by obtaining a concise characterization of the text which is based solely on substrings of the pattern in O(log m) time using n / log m processors. Then the desired output is derived from this characterization together with the tables built in the first part in O(k) time using n processors. The serial algorithm follows also this new approach for handling the text. It runs in O(kn) time for alphabet whose size is fixed. For general input the algorithm requires O(n(k + log m) ) time. In both cases the space requirement is O(n).
منابع مشابه
Data structures and algorithms for approximate string matching
This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.
متن کاملApproximate Multiple Pattern String Matching using Bit Parallelism: A Review
String matching is to find all the occurrences of a given pattern in a large text both being sequence of characters drawn from finite alphabet set. Approximate String Matching involves the detection of correct patterns along with the detection of some wrong patterns inside the text. Bit Parallelism is a feature that can be used to detect patterns inside the text and is reported to result in mor...
متن کاملA Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches
This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with kmismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in comp...
متن کاملTighter Packed Bit-Parallel NFA for Approximate String Matching
We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [1]. Given a length-m pattern and an error threshold k, the original BPD uses (m−k)(k +2) bits of space. We decrease this to (m− k)(k +1), and also give a slightly more efficient simulation algorithm for the NFA. In experiments our modified NFA is often noticeably more efficient tha...
متن کاملFamilies of FPGA-based accelerators for approximate string matching
Dynamic programming for approximate string matching is a large family of different algorithms, which vary significantly in purpose, complexity, and hardware utilization. Many implementations have reported impressive speed-ups, but have typically been point solutions - highly specialized and addressing only one or a few of the many possible options. The problem to be solved is creating a hardwar...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1986